EFFICIENCY OF ESTIMATION WHEN THERE IS ONLY ONE COMMON FACTOR



Abstract 

Explicit formulas are derived for the asymptotic sampling variances and 
covariances of the maximum likelihood estimators for factor-analysis param- 
eters in the special case where there is just one common factor. The effect 
of the number of variables on these variances and covariances is indicated. 

A formula is given showing to what extent the usual covariance between two 
of a set of variables can be estimated more efficiently when there is known 
to be just one common factor. 
EFFICIENCY OF ESTIMATION WHEN THERE IS ONLY ONE COMMON FACTOR*



1. Introduction 

Under multinormality assumptions, Lawley (1967) and Lockhart (1967)
give formulas for the inverse of the asymptotic variance-covariance matrix
of the maximum likelihood estimators for factor analysis parameters.

Anderson & Rubin (1956, eq. 12.25) give a complicated formula for the
asymptotic covariance between two estimated loadings on the same factor.
Explicit formulas for other asymptotic sampling variances and covariances
do not seem to be readily available. The present paper gives reasonably
convenient explicit formulas for these for the special case where there is
just one common factor. (In this paper, the terms asymptotic and large-sample
imply a large number of observations, not a large number of variables.)
It is shown how $n$, the number of random variables, affects the efficiency
with which factor loadings can be estimated.

When the one-factor model holds for a set of $n$ variables, the covariance
$\sigma_{ij}$ between any two variables $x_i$ and $x_j$ can be estimated more
efficiently than by using the ordinary sample covariance $s_{ij}$. An expression
for the increase in efficiency is given. A practical test-theory problem
that motivated the derivation is briefly considered in the final section.

2. Formulas for Sampling Variances and Covariances 
The factor analysis model for the one-factor case is

$$\Sigma = \lambda\lambda' + \Psi , \tag{1}$$




*This research was sponsored in part by the Personnel and Training 
Research Programs, Psychological Sciences Division, Office of Naval 
where $\Sigma = \|\sigma_{ij}\|$ is the (population) variance-covariance matrix for
$n \ge 3$ observed variables $x_i$, $\lambda = \{\lambda_i\}$ is the vector of loadings
on their common factor, and $\Psi = \|\psi_{ij}\|$ is a diagonal matrix. It will be
assumed that $\psi_{ii} > \epsilon > 0$, $i = 1, 2, \ldots, n$, where $\epsilon$ is
some small quantity.

For the one-factor case, ignoring terms of order $1/N$, the expected
values of the second derivatives of the logarithm of the likelihood function
may be summarized in matrix form as follows:



$$A \equiv -N^{-1}\,\|\mathcal{E}(\partial^2 \log L/\partial\lambda_i\,\partial\lambda_j)\| = c\Sigma^{-1} + \Sigma^{-1}\lambda\lambda'\Sigma^{-1} , \tag{2}$$

$$B \equiv -N^{-1}\,\|\mathcal{E}(\partial^2 \log L/\partial\lambda_i\,\partial\psi_{jj})\| = \Sigma^{-1}\Delta , \tag{3}$$

$$D \equiv -N^{-1}\,\|\mathcal{E}(\partial^2 \log L/\partial\psi_{ii}\,\partial\psi_{jj})\| = \tfrac{1}{2}\,\|(\sigma^{ij})^2\| , \tag{4}$$

where $N$ is the number of observations on which each $\hat\sigma_{ij}$ is based,

$$c \equiv \lambda'\Sigma^{-1}\lambda , \tag{5}$$

$\Delta$ is the diagonal matrix whose nonzero elements are $(1 - c)\psi^{ii}\lambda_i$, and
$\sigma^{ij}$ and $\psi^{ii}$ are the elements of $\Sigma^{-1}$ and $\Psi^{-1}$ respectively (in this
section, except for $N$, the number of observations, upper-case letters are
used for matrices, lower-case unsubscripted Greek letters for vectors, and
lower-case Roman or subscripted Greek letters for scalars). These three
equations are readily found from Jöreskog's convenient summary (1969, eqs.
17, 19, 22).

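Denote the maximum likelihood estimators of the parameters by
$\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_n, \hat\psi_{11}, \hat\psi_{22}, \ldots, \hat\psi_{nn}$.
The asymptotic variance-covariance matrix of these estimators is given by
the inverse of the partitioned matrix

$$N \begin{bmatrix} A & B \\ B' & D \end{bmatrix} .$$

This inverse can be found from the standard formula

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}BM^{-1}CA^{-1} & -A^{-1}BM^{-1} \\ -M^{-1}CA^{-1} & M^{-1} \end{bmatrix} , \tag{6}$$

where $M \equiv D - CA^{-1}B$.

A standard formula applied to (1) gives

$$\Sigma^{-1} = \Psi^{-1} - (1 - c)\kappa\kappa' , \tag{7}$$

where $\kappa \equiv \Psi^{-1}\lambda$. Similarly, from (2),

$$A^{-1} = c^{-1}(\Sigma - \tfrac{1}{2}c^{-1}\lambda\lambda') . \tag{8}$$

From (7), we find that

$$\Sigma^{-1}\lambda = (1 - c)\kappa . \tag{9}$$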
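
Equations (7) and (9) can be checked numerically without inverting anything: multiply $\Sigma$ from (1) by the right-hand side of (7) and compare with the identity matrix. The sketch below does this in plain Python. The loadings and unique variances are made-up illustrative values, not data from the paper, and the scalar $c$ is obtained from the identity $c = q/(1+q)$ with $q = \lambda'\Psi^{-1}\lambda$ (an assumption here that follows from the same standard formula and is then checked against definition (5)).

```python
# Numerical check of equations (7) and (9) for a small one-factor model.
# lam and psi are hypothetical illustrative values, not taken from the paper.
lam = [0.7, 0.6, 0.8, 0.5, 0.6]        # loadings lambda_i
psi = [0.51, 0.64, 0.36, 0.75, 0.64]   # unique variances psi_ii
n = len(lam)

# Equation (1): Sigma = lambda lambda' + Psi
sigma = [[lam[i] * lam[j] + (psi[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]

# kappa = Psi^{-1} lambda, and q = lambda' Psi^{-1} lambda
kappa = [lam[i] / psi[i] for i in range(n)]
q = sum(lam[i] * kappa[i] for i in range(n))
c = q / (1.0 + q)   # assumed identity; verified against (5) below

# Equation (7): candidate inverse of Sigma
sigma_inv = [[(1.0 / psi[i] if i == j else 0.0) - (1.0 - c) * kappa[i] * kappa[j]
              for j in range(n)] for i in range(n)]

# Sigma times the candidate inverse should be the identity matrix.
product = [[sum(sigma[i][k] * sigma_inv[k][j] for k in range(n))
            for j in range(n)] for i in range(n)]
max_dev = max(abs(product[i][j] - (1.0 if i == j else 0.0))
              for i in range(n) for j in range(n))

# Equation (5): c = lambda' Sigma^{-1} lambda.
sigma_inv_lam = [sum(sigma_inv[i][j] * lam[j] for j in range(n)) for i in range(n)]
c_from_def = sum(lam[i] * sigma_inv_lam[i] for i in range(n))

# Equation (9): Sigma^{-1} lambda = (1 - c) kappa.
dev_eq9 = max(abs(sigma_inv_lam[i] - (1.0 - c) * kappa[i]) for i in range(n))

print(max_dev, abs(c_from_def - c), dev_eq9)
```

All three printed deviations should be at rounding-error level if (7) and (9) hold for these values.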



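From (3) through (9),

$$M \equiv D - B'A^{-1}B = \tfrac{1}{2}\,\|X^{ij} + f^{-2}\kappa_i^2\kappa_j^2\| , \tag{10}$$

where

$$f \equiv \frac{c}{1 - c} = \sum_{i=1}^{n} \psi^{ii}\lambda_i^2 , \tag{11}$$

$$X^{ii} \equiv (f\psi_{ii}^2 g_i)^{-1} , \qquad X^{ij} \equiv 0 \quad (i \ne j) , \tag{12}$$

$$g_i \equiv (f - 2\psi^{ii}\lambda_i^2)^{-1} . \tag{13}$$

Then,

$$M^{-1} = 2(X - dX\gamma\gamma'X) , \tag{14}$$

where $X \equiv \|X_{ii}\| \equiv \|X^{ii}\|^{-1} = \|f\psi_{ii}^2 g_i\|$ is a diagonal matrix, $\gamma \equiv \{\kappa_i^2\}$,
and $d \equiv (f^2 + \gamma'X\gamma)^{-1}$.

We have

$$X_{ii}\gamma_i = f\lambda_i^2 g_i , \tag{15}$$

so that

$$d = \Bigl[f^2 + f\sum_{i=1}^{n} g_i(\psi^{ii}\lambda_i^2)^2\Bigr]^{-1} . \tag{16}$$

Finally, after substituting (15) and (16) into (14), we have by (6) the
asymptotic variances and covariances of the $\hat\psi_{ii}$:

$$\mathrm{Cov}(\hat\psi_{ii}, \hat\psi_{jj}) = \frac{2}{N}\,(\delta_{ij}f\psi_{ii}^2 g_i - df^2\lambda_i^2\lambda_j^2 g_i g_j) , \tag{17}$$

where $\delta_{ij}$ is the Kronecker delta. Similarly, after some algebra,

$$\mathrm{Cov}(\hat\lambda_i, \hat\psi_{jj}) = \frac{1}{N}\,(-2\delta_{ij}\lambda_i\psi_{ii}g_i + df^2\lambda_i\lambda_j^2 g_i g_j) , \tag{18}$$

$$\mathrm{Cov}(\hat\lambda_i, \hat\lambda_j) = \frac{1}{N}\,[\delta_{ij}\psi_{ii}(1 + g_i) + \tfrac{1}{2}\lambda_i\lambda_j(1 - df^2 g_i g_j)] . \tag{19}$$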
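
As a rough illustration of how (11) through (19) might be used, the sketch below evaluates the closed-form covariances and checks them against the information matrix built from (2) through (5): the product of $N$ times the partitioned matrix with the closed-form covariance matrix should be the identity. The function names, the parameter values and the sample size are hypothetical choices made for this example only. It assumes $f - 2\psi^{ii}\lambda_i^2 > 0$ for every $i$, so that each $g_i$ in (13) is finite and positive.

```python
def one_factor_acov(lam, psi, N):
    """Closed-form asymptotic covariances from equations (11)-(19).

    Returns (cov_ll, cov_lp, cov_pp), each an n x n list of lists:
    loadings with loadings, loadings (rows) with unique variances
    (columns), and unique variances with unique variances.
    """
    n = len(lam)
    r = [lam[i] ** 2 / psi[i] for i in range(n)]        # psi^{ii} lambda_i^2
    f = sum(r)                                           # (11)
    g = [1.0 / (f - 2.0 * r[i]) for i in range(n)]       # (13)
    d = 1.0 / (f ** 2 + f * sum(g[i] * r[i] ** 2 for i in range(n)))  # (16)
    cov_ll = [[0.0] * n for _ in range(n)]
    cov_lp = [[0.0] * n for _ in range(n)]
    cov_pp = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same = 1.0 if i == j else 0.0                # Kronecker delta
            w = d * f ** 2 * g[i] * g[j]
            cov_pp[i][j] = 2.0 * (same * f * psi[i] ** 2 * g[i]
                                  - w * lam[i] ** 2 * lam[j] ** 2) / N   # (17)
            cov_lp[i][j] = (-2.0 * same * lam[i] * psi[i] * g[i]
                            + w * lam[i] * lam[j] ** 2) / N              # (18)
            cov_ll[i][j] = (same * psi[i] * (1.0 + g[i])
                            + 0.5 * lam[i] * lam[j] * (1.0 - w)) / N     # (19)
    return cov_ll, cov_lp, cov_pp


def information_matrix(lam, psi, N):
    """N times the partitioned matrix [[A, B], [B', D]] from (2)-(5)."""
    n = len(lam)
    kappa = [lam[i] / psi[i] for i in range(n)]
    f = sum(lam[i] * kappa[i] for i in range(n))
    c = f / (1.0 + f)                                    # equivalent to (11)
    s_inv = [[(1.0 / psi[i] if i == j else 0.0) - (1.0 - c) * kappa[i] * kappa[j]
              for j in range(n)] for i in range(n)]      # (7)
    u = [(1.0 - c) * kappa[i] for i in range(n)]         # (9): Sigma^{-1} lambda
    info = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            info[i][j] = N * (c * s_inv[i][j] + u[i] * u[j])   # A, eq. (2)
            info[i][n + j] = N * s_inv[i][j] * u[j]            # B, eq. (3)
            info[n + j][i] = info[i][n + j]                    # B'
            info[n + i][n + j] = N * 0.5 * s_inv[i][j] ** 2    # D, eq. (4)
    return info


# Hypothetical parameter values and sample size, for illustration only.
lam = [0.7, 0.6, 0.8, 0.5, 0.6]
psi = [0.51, 0.64, 0.36, 0.75, 0.64]
N = 500
n = len(lam)

cov_ll, cov_lp, cov_pp = one_factor_acov(lam, psi, N)
acov = ([cov_ll[i] + cov_lp[i] for i in range(n)]
        + [[cov_lp[j][i] for j in range(n)] + cov_pp[i] for i in range(n)])
info = information_matrix(lam, psi, N)

m = 2 * n
prod = [[sum(info[i][k] * acov[k][j] for k in range(m)) for j in range(m)]
        for i in range(m)]
max_dev = max(abs(prod[i][j] - (1.0 if i == j else 0.0))
              for i in range(m) for j in range(m))
print(max_dev)
```

Checking by multiplication rather than by numerical inversion keeps the example free of any linear-algebra library.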

3. The Role of n

Now that these formulas are available, we can examine the role of $n$,
the number of variables, in determining the sampling variances and
covariances. Let us assume that there is a uniform upper bound to all the
$\sigma_{ii}$. Then it appears from (11), (13), (16) that

$f$ is of order $n$,
$g_i$ is of order $n^{-1}$,
$d$ is of order $n^{-2}$,
$fg_i = 1$ plus terms of order $n^{-1}$.

Now, the second term in (17), and both terms in (18), are of order
$n^{-1}$ or smaller. Thus the sampling covariances between $\hat\psi_{ii}$ and
$\hat\psi_{jj}$, $i \ne j$, and also between $\hat\psi_{ii}$ and $\hat\lambda_i$ or
$\hat\lambda_j$, vanish for large $n$. If we neglect terms of order $n^{-1}$,

$$\mathrm{Var}\,\hat\psi_{ii} \doteq 2\psi_{ii}^2/N , \tag{20}$$

$$\mathrm{Var}\,\hat\lambda_i \doteq (\psi_{ii} + \lambda_i^2/2)/N , \tag{21}$$

$$\mathrm{Cov}(\hat\lambda_i, \hat\lambda_j) \doteq \lambda_i\lambda_j/2N \quad (i \ne j) . \tag{22}$$
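
To give a feel for how quickly the large-$n$ approximation (21) takes hold, the following sketch compares it with the exact diagonal element of (19) for an increasing number of variables. It assumes, purely for illustration, that all variables share the same hypothetical loading 0.7 and unique variance 0.51; the function names are likewise invented for this example.

```python
def var_loading_exact(lam, psi, N, i=0):
    """Asymptotic Var of the i-th estimated loading, diagonal element of (19)."""
    r = [l * l / p for l, p in zip(lam, psi)]            # psi^{ii} lambda_i^2
    f = sum(r)                                           # (11)
    g = [1.0 / (f - 2.0 * ri) for ri in r]               # (13)
    d = 1.0 / (f * f + f * sum(gi * ri * ri for gi, ri in zip(g, r)))  # (16)
    return (psi[i] * (1.0 + g[i])
            + 0.5 * lam[i] ** 2 * (1.0 - d * f * f * g[i] ** 2)) / N


def var_loading_limit(lam_i, psi_ii, N):
    """Large-n approximation, equation (21)."""
    return (psi_ii + 0.5 * lam_i ** 2) / N


N = 1000                       # hypothetical number of observations
ratios = {}
for n in (3, 5, 10, 50, 200):
    lam = [0.7] * n            # hypothetical equal loadings
    psi = [0.51] * n           # hypothetical equal unique variances
    exact = var_loading_exact(lam, psi, N)
    ratios[n] = exact / var_loading_limit(0.7, 0.51, N)
    print(n, round(ratios[n], 4))
```

Under these assumptions the ratio of exact to limiting variance exceeds 1 and shrinks toward 1 as $n$ grows, which is one way to see the gain from adding variables that measure the same factor.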



4. Estimating the Covariances between Variables 

Under the one-factor model, the maximum likelihood estimators of the
population variances and covariances are given by

$$\hat\sigma_{ii} \equiv \hat\lambda_i^2 + \hat\psi_{ii} = s_i^2 \equiv \frac{1}{N}\sum_{a=1}^{N}(x_{ia} - \bar{x}_i)^2 , \tag{23}$$

$$\hat\sigma_{ij} \equiv \hat\lambda_i\hat\lambda_j \quad \text{if } i \ne j . \tag{24}$$




Only the second formula gives a different estimator than would be appro- 
priate without the single-factor assumption. 

The asymptotic variance of $\hat\sigma_{ij}$ is given by

$$\mathrm{Var}\,\hat\sigma_{ij} = \lambda_i^2\,\mathrm{Var}\,\hat\lambda_j + \lambda_j^2\,\mathrm{Var}\,\hat\lambda_i + 2\lambda_i\lambda_j\,\mathrm{Cov}(\hat\lambda_i, \hat\lambda_j) .$$

By (21) and (22), omitting terms of order $1/n$, when $i \ne j$

$$\mathrm{Var}\,\hat\sigma_{ij} = \frac{1}{N}\,(\lambda_i^2\psi_{jj} + \lambda_j^2\psi_{ii} + 2\lambda_i^2\lambda_j^2) = \frac{1}{N}\,(\sigma_i^2\sigma_j^2 + \sigma_{ij}^2 - \psi_{ii}\psi_{jj}) . \tag{25}$$

Without the one-factor assumption, one would estimate $\sigma_{ij}$ by the second
bivariate sample moment

$$s_{ij} \equiv \frac{1}{N}\sum_{a=1}^{N}(x_{ia} - \bar{x}_i)(x_{ja} - \bar{x}_j) , \tag{26}$$

which has an asymptotic variance of $(\sigma_i^2\sigma_j^2 + \sigma_{ij}^2)/N$. Thus, for large $n$
and $N$ the use of the one-factor assumption decreases the sampling variance
of our estimate of $\sigma_{ij}$ by the amount $\psi_{ii}\psi_{jj}/N$.
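
A small worked example may make the size of this reduction concrete. The sketch below evaluates (25) and the asymptotic variance of $s_{ij}$ for one pair of variables. The loadings, unique variances and sample size are hypothetical, and the function names are invented for the illustration.

```python
def var_cov_one_factor(lam_i, lam_j, psi_ii, psi_jj, N):
    """Equation (25): large-n asymptotic variance of lambda_i-hat * lambda_j-hat."""
    return (lam_i ** 2 * psi_jj + lam_j ** 2 * psi_ii
            + 2.0 * lam_i ** 2 * lam_j ** 2) / N


def var_cov_sample(lam_i, lam_j, psi_ii, psi_jj, N):
    """Asymptotic variance of the sample covariance s_ij, as given after (26)."""
    var_i = lam_i ** 2 + psi_ii      # sigma_i^2 under the one-factor model
    var_j = lam_j ** 2 + psi_jj      # sigma_j^2
    cov_ij = lam_i * lam_j           # sigma_ij
    return (var_i * var_j + cov_ij ** 2) / N


# Hypothetical values for two standardized variables.
lam_i, lam_j = 0.7, 0.6
psi_ii, psi_jj = 0.51, 0.64
N = 200

v_factor = var_cov_one_factor(lam_i, lam_j, psi_ii, psi_jj, N)
v_sample = var_cov_sample(lam_i, lam_j, psi_ii, psi_jj, N)
reduction = v_sample - v_factor          # should equal psi_ii * psi_jj / N
relative_efficiency = v_sample / v_factor

print(v_factor, v_sample, reduction, relative_efficiency)
```

The reduction is largest, relative to the variance of $s_{ij}$, when both unique variances are large, that is, when the two variables are only weakly related to the common factor.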

5. Estimating a Validity Coefficient of a Composite 



Suppose $x_n$ is a criterion variable of interest, and suppose that we
are interested in using the "total score"

$$X \equiv \sum_{i=1}^{n-1} x_i$$

to predict $x_n$. The "validity coefficient" for the effectiveness of the
total score for this purpose is the correlation

$$\rho(X, x_n) = \sum_{i=1}^{n-1}\sigma_{in} \Big/ \sigma_X\sigma_n .$$
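
For concreteness, the population validity coefficient can be written directly in terms of the one-factor parameters, since $\sigma_{in} = \lambda_i\lambda_n$ and $\sigma_X^2 = (\sum_{i<n}\lambda_i)^2 + \sum_{i<n}\psi_{ii}$ under (1). The sketch below computes it this way and compares the result with a brute-force sum over the elements of $\Sigma$. The parameter values and function name are hypothetical, and the last variable is taken to be the criterion.

```python
import math


def validity_coefficient(lam, psi):
    """Correlation between X = x_1 + ... + x_{n-1} and the criterion x_n
    when Sigma = lambda lambda' + Psi; the last variable is the criterion."""
    lam_pred, psi_pred = lam[:-1], psi[:-1]
    numerator = lam[-1] * sum(lam_pred)               # sum of sigma_in, i < n
    var_total = sum(lam_pred) ** 2 + sum(psi_pred)    # sigma_X^2
    var_crit = lam[-1] ** 2 + psi[-1]                 # sigma_n^2
    return numerator / math.sqrt(var_total * var_crit)


# Hypothetical parameters; each variable has unit variance here.
lam = [0.7, 0.6, 0.8, 0.5]
psi = [0.51, 0.64, 0.36, 0.75]
n = len(lam)

rho = validity_coefficient(lam, psi)

# Brute-force version from the full covariance matrix, for comparison.
sigma = [[lam[i] * lam[j] + (psi[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
num = sum(sigma[i][n - 1] for i in range(n - 1))
var_x = sum(sigma[i][j] for i in range(n - 1) for j in range(n - 1))
rho_brute = num / math.sqrt(var_x * sigma[n - 1][n - 1])

print(rho, rho_brute)
```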



When the single-factor model holds, every term in the numerator can
be estimated more accurately by (24) than by (26). This seems at first
sight to guarantee a reduction in the sampling errors of the estimated
validity coefficient, an important consideration when choosing among
several predictors.

Some algebra (not given here) shows, however, that the sampling variance
of the estimated numerator is the same whether the one-factor model
is assumed or not. The same is true for the estimated $\rho(X, x_n)$. The
reason is that although $\mathrm{Var}\,\hat\sigma_{ij}$ is smaller under (24) than under (26),
$\mathrm{Cov}(\hat\sigma_{ij}, \hat\sigma_{gh})$ is larger under (24). Both $\mathrm{Var}\,\hat\sigma_{ij}$ and
$\mathrm{Cov}(\hat\sigma_{ij}, \hat\sigma_{gh})$ are involved in the formulas for
$\mathrm{Var}(\sum_{i=1}^{n-1}\hat\sigma_{in})$ and $\mathrm{Var}\,\hat\rho(X, x_n)$.